Multilingual Acquisition of Structured Information via Novel Relationship Extraction Models over Diverse Knowledge Sources
Abstract
This dissertation presents original techniques for a class of problems that can be collectively referred to as relationship extraction. This machine learning task involves extracting tuples from free text, the exemplar instantiations of which help model the target relationship. A wide range of relationships is explored, including semantic relationships between words, their translation equivalents in different languages, and encyclopedic facts about named entities. This dissertation explores new relationship extraction models which exploit novel knowledge sources across a diverse set of relationship types in multiple languages. It ties together the extraction of diverse relationships in the classic seed-based, minimally supervised framework. However, this framework has previously failed to capture information beyond local context, such as transitively-derived information, domain constraints and knowledge, correlations among relationships, and additional novel knowledge sources. Furthermore, the traditional seed-based learning framework fails to extract non-overt relationships, such as an author’s gender or age, when they are not explicitly stated. In contrast, some of these non-overt relationships can be inferred
Similar resources
Transfer Learning Based Cross-lingual Knowledge Extraction for Wikipedia
Wikipedia infoboxes are a valuable source of structured knowledge for global knowledge sharing. However, infobox information is very incomplete and imbalanced among the Wikipedias in different languages. It is a promising but challenging problem to utilize the rich structured knowledge from a source language Wikipedia to help complete the missing infoboxes for a target language. In this paper, ...
Graph-Based Weakly-Supervised Methods for Information Extraction & Integration
The variety and complexity of potentially-related data resources available for querying (webpages, databases, data warehouses) has been growing ever more rapidly. There is a growing need to pose integrative queries across multiple such sources, exploiting foreign keys and other means of interlinking data to merge information from diverse sources. This has traditionally been the focus of resea...
Information Extraction from Biomedical Texts: Learning Models with Limited Supervision
Among the application domains of information extraction, the biomedical domain is one of the most important ones. This is due to the large amount of biomedical text sources including the vast scientific literature and collections of patient reports written in natural language. These sources contain a wealth of crucial knowledge that needs to be mined. Typical mining tasks regard entity recognit...
A Composite Kernel to Extract Relations between Entities with Both Flat and Structured Features
This paper proposes a novel composite kernel for relation extraction. The composite kernel consists of two individual kernels: an entity kernel that allows for entity-related features and a convolution parse tree kernel that models syntactic information of relation examples. The motivation of our method is to fully utilize the nice properties of kernel methods to explore diverse knowledge for r...
Modern Multilingual and Cross-lingual Information Access Technologies
In this chapter, we describe state-of-the-art cross-lingual and multilingual strategies and their related areas. In particular, we show a WWW-based information system called MIETTA, which allows uniform and multilingual access to heterogeneous data sources in the tourism domain. The design of the search engine is based on a new cross-lingual framework. The framework integrates a cross-lingu...
Publication date: 2009